108 research outputs found

    Characterisation of Inactivation Domains and Evolutionary Strata in Human X Chromosome through Markov Segmentation

    Markov segmentation is a method of identifying compositionally different subsequences in a given symbolic sequence. We have applied this technique to the DNA sequence of the human X chromosome to analyze its compositional structure. The human X chromosome is known to have acquired DNA through distinct evolutionary events and is believed to be composed of five evolutionary strata. In addition, in female mammals all copies of the X chromosome in excess of one are transcriptionally inactivated. The location of a gene is correlated with its ability to undergo inactivation, but correlations between evolutionary strata and inactivation domains are less clear. Our analysis provides an accurate estimate of the location of stratum boundaries and gives a high-resolution map of compositionally different regions on the X chromosome. This leads to the identification of a novel stratum, as well as segments wherein a group of genes either undergo inactivation or escape inactivation in toto. We identify oligomers that appear to be unique to inactivation domains alone.
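The idea of splitting a symbolic sequence where its composition changes can be illustrated with a minimal sketch. This is a simplification, not the authors' method: it scores a single split using plain nucleotide frequencies (an order-0 model) and the Jensen-Shannon divergence, whereas the abstract describes a Markov-model-based segmentation. The function names, the `min_len` parameter, and the toy sequence are all hypothetical.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a table of symbol counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def best_split(seq, min_len=5):
    """Return (position, divergence) for the split point that maximises the
    Jensen-Shannon divergence between left and right symbol compositions.

    Assumption: order-0 composition only; a Markov segmentation would
    compare higher-order models instead.
    """
    n = len(seq)
    h_all = entropy(Counter(seq), n)
    left, right = Counter(), Counter(seq)
    best_pos, best_js = None, 0.0
    for i in range(1, n):
        # Move one symbol from the right segment to the left segment.
        left[seq[i - 1]] += 1
        right[seq[i - 1]] -= 1
        if i < min_len or n - i < min_len:
            continue
        js = h_all - (i / n) * entropy(left, i) - ((n - i) / n) * entropy(right, n - i)
        if js > best_js:
            best_pos, best_js = i, js
    return best_pos, best_js


# Toy sequence with an obvious compositional boundary at position 50.
toy = "A" * 50 + "G" * 50
pos, js = best_split(toy)
```

Applying such a split recursively to each resulting segment, with a significance threshold to stop, gives a segmentation; the threshold choice is left out of this sketch.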

    Genomic characterization of five deletions in the LDL receptor gene in Danish Familial Hypercholesterolemic subjects

    BACKGROUND: Familial Hypercholesterolemia is a common autosomal dominantly inherited disease that is most frequently caused by mutations in the gene encoding the receptor for low density lipoproteins (LDLR). Deletions and other major structural rearrangements of the LDLR gene account for approximately 5% of the mutations in many populations. METHODS: Five genomic deletions in the LDLR gene were characterized by amplification of mutated alleles and sequencing to identify genomic breakpoints. A diagnostic assay based on duplex PCR for the exon 7–8 deletion was developed to discriminate between heterozygotes and normal homozygotes, and bioinformatic analyses were used to identify interspersed repeats flanking the deletions. RESULTS: In one case 15 bp had been inserted at the site of the deleted DNA, and, in all five cases, Alu elements flanked the sites where deletions had occurred. An assay developed to discriminate the wild-type and the deletion allele in a simple duplex PCR detected three FH patients as heterozygotes, and two individuals with normal lipid values were detected as normal homozygotes. CONCLUSION: The identification of the breakpoints should make it possible to develop specific tests for these mutations, and the data provide further evidence for the role of Alu repeats in intragenic deletions.

    A new measure for functional similarity of gene products based on Gene Ontology

    BACKGROUND: Gene Ontology (GO) is a standard vocabulary of functional terms and allows for coherent annotation of gene products. These annotations provide a basis for new methods that compare gene products regarding their molecular function and biological role. RESULTS: We present a new method for comparing sets of GO terms and for assessing the functional similarity of gene products. The method relies on two semantic similarity measures: sim(Rel) and funSim. One measure (sim(Rel)) is applied in the comparison of the biological processes found in different groups of organisms. The other measure (funSim) is used to find functionally related gene products within the same or between different genomes. Results indicate that the method, in addition to being in good agreement with established sequence similarity approaches, also provides a means for the identification of functionally related proteins independent of evolutionary relationships. The method is also applied to estimating functional similarity between all proteins in Saccharomyces cerevisiae and to visualizing the molecular function space of yeast in a map of the functional space. A similar approach is used to visualize the functional relationships between protein families. CONCLUSION: The approach enables the comparison of the underlying molecular biology of different taxonomic groups and provides a new comparative genomics tool identifying functionally related gene products independent of homology. The proposed map of the functional space provides a new global view on the functional relationships between gene products or protein families.

    Specific and Sensitive Detection of H. pylori in Biological Specimens by Real-Time RT-PCR and In Situ Hybridization

    PCR detection of H. pylori in biological specimens is rendered difficult by the extensive polymorphism of H. pylori genes and the suppressed expression of some genes in many strains. The goal of the present study was to (1) define a domain of the 16S rRNA sequence that is both highly conserved among H. pylori strains and also specific to the species, and (2) to develop and validate specific and sensitive molecular methods for the detection of H. pylori. We used a combination of in silico and molecular approaches to achieve sensitive and specific detection of H. pylori in biologic media. We sequenced two isolates from patients living in different continents and demonstrated that a 546-bp domain of the H. pylori 16S rRNA sequence was conserved in those strains and in published sequences. Within this conserved sequence, we defined a 229-bp domain that is 100% homologous in most H. pylori strains available in GenBank and also is specific for H. pylori. This sub-domain was then used to design (1) a set of high quality RT-PCR primers and probe that encompassed a 76-bp sequence and included at least two mismatches with other Helicobacter sp. 16S rRNA; and (2) in situ hybridization antisense probes. The sensitivity and specificity of the approaches were then demonstrated by using gastric biopsy specimens from patients and rhesus monkeys. This H. pylori-specific region of the 16S rRNA sequence is highly conserved among most H. pylori strains and allows specific detection, identification, and quantification of this bacterium in biological specimens

    Incorporating Distant Sequence Features and Radial Basis Function Networks to Identify Ubiquitin Conjugation Sites

    Ubiquitin (Ub) is a small protein that consists of 76 amino acids and has a mass of about 8.5 kDa. In ubiquitin conjugation, ubiquitin is mainly conjugated to lysine residues of the target protein by Ub-ligating (E3) enzymes. Three major enzymes participate in ubiquitin conjugation: E1, E2 and E3, which are responsible for activating, conjugating and ligating ubiquitin, respectively. Ubiquitin conjugation in eukaryotes is an important mechanism of the proteasome-mediated degradation of a protein and of regulating the activity of transcription factors. Motivated by the importance of ubiquitin conjugation in biological processes, this investigation develops a method, UbSite, which uses an efficient radial basis function (RBF) network to identify protein ubiquitin conjugation (ubiquitylation) sites. This work investigates not only the amino acid composition but also the structural characteristics, physicochemical properties, and evolutionary information of amino acids around ubiquitylation (Ub) sites. With reference to the pathway of ubiquitin conjugation, the substrate sites for E3 recognition, which are distant from ubiquitylation sites, are investigated. The measurement of F-score in a large window size (−20 to +20) revealed a statistically significant amino acid composition and position-specific scoring matrix (evolutionary information), which are mainly located distant from Ub sites. The distant information can be used effectively to differentiate Ub sites from non-Ub sites. As determined by five-fold cross-validation, the model that was trained using the combination of amino acid composition and evolutionary information performs best in identifying ubiquitin conjugation sites. The prediction sensitivity, specificity, and accuracy are 65.5%, 74.8%, and 74.5%, respectively.
Although the amino acid sequences around the ubiquitin conjugation sites do not contain conserved motifs, the cross-validation result indicates that the integration of distant sequence features of Ub sites can improve predictive performance. Additionally, the independent test demonstrates that the proposed method can outperform other ubiquitylation prediction tools
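For readers unfamiliar with the three performance figures quoted above, the following minimal sketch shows how sensitivity, specificity, and accuracy are computed from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the UbSite evaluation.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for a binary classifier.

    tp/fn: positives (e.g. true Ub sites) predicted correctly/incorrectly.
    tn/fp: negatives (non-Ub sites) predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)            # fraction of real sites recovered
    specificity = tn / (tn + fp)            # fraction of non-sites rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = classification_metrics(tp=40, fn=10, tn=90, fp=10)
```

Accuracy is a weighted average of sensitivity and specificity, weighted by the share of positive and negative examples, so it always lies between the two.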

    Molecular diversity of phospholipase D in angiosperms

    BACKGROUND: The phospholipase D (PLD) family has been identified in plants by recent molecular studies, fostered by the emerging importance of plant PLDs in stress physiology and signal transduction. However, the presence of multiple isoforms limits the power of conventional biochemical and pharmacological approaches, and calls for a wider application of genetic methodology. RESULTS: Taking advantage of sequence data available in public databases, we attempted to provide a prerequisite for such an approach. We made a complete inventory of the Arabidopsis thaliana PLD family, which was found to comprise 12 distinct genes. The current nomenclature of Arabidopsis PLDs was refined and expanded to include five newly described genes. To assess the degree of plant PLD diversity beyond Arabidopsis we explored data from rice (including the genome draft by Monsanto) as well as cDNA and EST sequences from several other plants. Our analysis revealed two major PLD subfamilies in plants. The first, designated C2-PLD, is characterised by the presence of the C2 domain and comprises previously known plant PLDs as well as new isoforms with possibly unusual features: catalytically inactive or independent of Ca²⁺. The second subfamily (denoted PXPH-PLD) is novel in plants but is related to animal and fungal enzymes possessing the PX and PH domains. CONCLUSIONS: The evolutionary dynamics and inter-specific diversity of plant PLDs inferred from our phylogenetic analysis call for more plant species to be employed in PLD research. This will enable us to obtain generally valid conclusions.

    Large-Scale Evidence for Conservation of NMD Candidature Across Mammals

    BACKGROUND: Alternatively-spliced (AS) forms can vary protein function, intracellular localization and post-translational modifications. AS coupled with mRNA nonsense-mediated decay (NMD) can also control transcript abundance. Here, we have investigated the genome-scale conservation of alternatively-spliced NMD candidates (AS-NMD candidates) in mammals. METHODOLOGY/PRINCIPAL FINDINGS: We mapped >12 million cDNA/EST library transcripts, comprising pooled data from both older and next-generation sequencing techniques, against genomic sequences to annotate AS-NMD candidates generated by in-frame premature termination codons (PTCs) in the human, mouse, rat and cow genomes. In these genomes, we found populations of genes that harbour AS-NMD candidates, varying in number from approximately 149 to 2,051 genes. We discovered that a highly significant proportion (27%-35%) of AS-NMD candidate genes in mouse, rat and cow also have human orthologs targeted for NMD. Intron retention was the most abundant type of AS-NMD, ranging from 43% to 67% of genes harbouring an AS-NMD candidate. Groupings of AS-NMD candidate genes either with or without intron retentions also have highly significant AS-NMD conservation, indicating that the trend is not due primarily to conservation of intron retentions. As a subset, the AS-NMD intron retentions are distinguished from non-retained introns by higher GC content, and codon usage similar to the usage in protein-coding sequences. This indicates that most of these alternatively spliced sequences have coded for proteins in the recent evolutionary past. In general, the AS-NMD candidate genes showed a similar pattern of Gene Ontology functional category enrichments in all four species. Genes linked to nucleic-acid interaction and apoptosis, and involved in pathways linked with cancer, were the most common.
Finally, we mapped the AS-NMD candidates to mass spectrometry-derived proteomics data, and gathered evidence of truncated polypeptides for at least 10% of all human AS-NMD candidate transcripts. CONCLUSIONS/SIGNIFICANCE: In summary, our analysis provides strong statistical evidence for conservation of functional AS-NMD candidature across Mammalia for a large subset of genes. However, because codon usage of AS-NMD intron retentions is similar to the usage in exons, it is difficult to de-couple conservation of AS-NMD-based regulation from conservation for protein-coding ability, for intron retentions
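The GC-content comparison between retained and non-retained introns mentioned above rests on a simple per-sequence statistic. A minimal sketch follows; the function name and the example sequences are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption rather than something stated in the abstract.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (seq.count("G") + seq.count("C")) / acgt


# Hypothetical toy sequences, not real introns.
retained = gc_content("GGCCAT")      # 4 of 6 bases are G or C
non_retained = gc_content("atgcNN")  # N is ignored: 2 of 4 bases are G or C
```

In practice the statistic would be computed for every intron in each group and the two distributions compared with a suitable statistical test.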

    Sequences, Annotation and Single Nucleotide Polymorphism of the Major Histocompatibility Complex in the Domestic Cat

    Two sequences of major histocompatibility complex (MHC) regions in the domestic cat, 2.976 and 0.362 Mbp, which were separated by an ancient chromosome break (55–80 MYA) followed by a chromosomal inversion, were annotated in detail. Gene annotation of this MHC was completed and identified 183 possible coding regions (147 human homologues, possible functional genes, and 36 pseudo/unidentified genes) using the GENSCAN, BLASTN, BLASTP and RepeatMasker programs. The first region spans 2.976 Mbp of sequence, which encodes six classical class II antigens (three DRA and three DRB antigens) lacking the functional DP and DQ regions, nine antigen processing molecules (DOA/DOB, DMA/DMB, TAPASIN, LMP2/LMP7 and TAP1/TAP2), 52 class III genes, and nineteen class I genes/gene fragments (FLAI-A to FLAI-S). Three class I genes (FLAI-H, I-K, I-E) may encode functional classical class I antigens based on deduced amino acid sequence and promoter structure. The second region spans 0.362 Mbp of sequence encoding no class I genes and 18 cross-species conserved genes, excluding class I, II and their functionally related/associated genes, namely framework genes, including three olfactory receptor genes. One previously identified feline endogenous retrovirus, a baboon retrovirus derived sequence (ECE1), and two new endogenous retrovirus sequences similar to brown bat endogenous retrovirus (FERVmlu1, FERVmlu2) were found within a 140 Kbp interval in the middle of the class I region. MHC SNPs were examined based on comparisons of this BAC sequence and MHC homozygous 1.9× WGS sequences, yielding 11,654 SNPs in 2.84 Mbp (0.00411 SNP per bp), a rate 2.4 times higher than that of the average heterozygous region in the WGS (0.0017 SNP per bp) and slightly higher than the SNP rate observed in human MHC (0.00337 SNP per bp).
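The SNP-rate comparisons in this abstract are plain ratios and can be rechecked from the quoted figures. The short sketch below does so; the variable names are hypothetical. Note that dividing the SNP count by the rounded 2.84 Mbp length gives about 0.00410 SNP per bp, marginally below the reported 0.00411, presumably because the region length is rounded in the abstract.

```python
# Figures quoted in the abstract above.
snp_count = 11_654
region_bp = 2.84e6             # 2.84 Mbp, as rounded in the abstract
mhc_rate_reported = 0.00411    # SNP per bp reported for the cat MHC
wgs_rate = 0.0017              # SNP per bp, average heterozygous WGS region
human_mhc_rate = 0.00337       # SNP per bp, human MHC

# Rate recomputed from the rounded region length.
mhc_rate = snp_count / region_bp

# Fold differences based on the reported cat MHC rate.
fold_vs_wgs = mhc_rate_reported / wgs_rate
fold_vs_human = mhc_rate_reported / human_mhc_rate

print(f"recomputed rate: {mhc_rate:.5f} SNP/bp")
print(f"cat MHC vs WGS average: {fold_vs_wgs:.1f}x")
print(f"cat MHC vs human MHC: {fold_vs_human:.2f}x")
```

The ratio against the WGS average reproduces the 2.4-fold figure, and the ratio against the human MHC rate comes out at roughly 1.2, consistent with "slightly higher".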

    The FGGY carbohydrate kinase family: insights into the evolution of functional specificities

    © The Author(s), 2011. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in PLoS Computational Biology 7 (2011): e1002318, doi:10.1371/journal.pcbi.1002318. Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, which, among many other reasons, is due to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of the same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families. This study was funded by NIH and DOE grants.